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ABSTRACT 



Confexf. Inversion techniques are the most powerful methods to obtain information about the thermodynamical and magnetic proper- 
ties of solar and stellar atmospheres. In the last years, we have witnessed the development of highly sophisticated inversion codes that 
are now widely applied to spectro-polarimetric observations. The majority of these inversion codes are based on the optimization of a 
complicated non-linear merit function. The large experience gained over the years have facilitated the recovery of the model that best 
fits a given observation. However, and except for the recently developed inversion codes based on database search algorithms together 
with the application of Principal Component Analysis, no reliable and statistically well-defined confidence intervals can be obtained 
for the parameters inferred from the inversions. 

Aims. A correct estimation of the confidence intervals for all the parameters that describe the model is mandatory. Additionally, it is 
fundamental to apply efficient techniques to assess the ability of models to reproduce the observations and to what extent the models 
have to be refined or can be simplified. 

Methods. Bayesian techniques are applied to analyze the performance of the model to fit a given observed Stokes vector. The posterior 
distribution, that takes into account both the information about the priors and the likelihood, is efficiently sampled using a Markov 
Chain Monte Carlo method. For simplicity, we focus on the Milne-Eddington approximate solution of the radiative transfer equation 
and we only take into account the generation of polarization through the Zeeman effect. However, the method is extremely general 
and other more complex forward models can be applied, even allowing for the presence of atomic polarization. 
Results. We illustrate the ability of the method with the aid of different problems, from academic to more realistic examples. We 
show that the information provided by the posterior distribution turns out to be fundamental to understand and determine the amount 
of information available in the Stokes profiles in these particular cases. 
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1. Introduction 

One of the most important breakthroughs in the interpretation 
of spectro-polarimetric observations has been the development 
and systematic ap plication of inversion techniques (see e.g., 
BellotRubioJ^006, and refer ences therein). They have allowed 
to extract as much information as possible from the observed 
Stokes profiles. A model that is assumed to be successful in de- 
scribing the astrophysical plasma that we are observing is pro- 
posed by the physicist. This model is defined by a set of pa- 
rameters, usually associated with interesting physical quantities. 
It can happen that these physical parameters are not direct ob- 
servables that can be obtained directly from the Stokes profiles. 
Then, inversion techniques try to adjust the parameters that char- 
acterize the selected model so that the emergent Stokes profiles 
reproduce, as good as possible, the observed profiles. 

The initial steps in the development of inversion codes were 
limited by the computational time. These inversion techniques 
often need the application of time-consuming non-linear opti- 
mization methods. For this reason, the first generation of inver- 
sion codes either used very simple models to reproduce the ob- 
served Stokes profiles or introduced some additional physical in- 
gredients in the inver sion scheme so that the complication s were 
highly reduced (e.g., lAuer et al.|[l977t iKeller et al.lll990l) . This 
is the reason why simple Milne-Eddington atmospheres (ME, 
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lAuer et aT | Il977t lLandi Degl'Innocenti & Landolfil |2004) have 
been widely applied for the retrieval of some kind of average 
magnetic field vector in the line formation region. Although the 
assumptions in which ME atmospheres are based may not be 
exactly fulfilled in the solar atmosphere, they have been exten- 
sively used. The reason is their inherent simplicity and the fact 
that there is an analytic expression for the emergent Stokes pro- 
files in terms of the physical parameters. 

A great leap forward was the development of inver- 
sion codes based on the concep t of response functions 
dRuiz Cobo & del To ro Iniesta| [r992l) . They have facilitated the 
inversion of Stokes profiles so that it is now possible to infer 
the vertical stratification of the thermodynamical and magnetic 
properties of the atmosphere if the information is present on 
the Stokes profiles. The presence of vertical variations along the 
line-of-sight of the physical properties are of importance for ex- 
plain ing the strong asymmetries observed in suns pots and facu- 
lae filling et alJll975ISanchez Almeida et al.lll989h . 

The development of such powerful and computationally ef- 
ficient inversion codes has led to an extensive number of ap- 
plicati ons to a large variety of sol a r atmospheric structures 
(e.g.. IWestendorp Plaza et all |1997t ISanchez Almeidal Il997t 
iLites et alJll998t uMathew et alj|200a iBommier et alj|2007l) . In 
spite of the success, it is important to be very cautious when 
using an inversion code. They essentially carry out the optimiza- 
tion of a given merit function with respect to a set of parameters. 
It is fundamental, however, to be cautious with several funda- 
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mental points. First, the number of free parameters cannot be as 
large as desired. The reason is that the amount of information 
available in the observed Stokes profiles might not be enough 
to constrain the value of many of these parameters. Reasons for 
this can be assigned to the presence of noise that mask the line 
profile dependence on certain parameters or the fact the Stokes 
parameters are completely insensitive to a parameter due to the 
intrinsic line formation process. Second, the parameters that we 
use to describe a given model might not be completely inde- 
pendent so that there exists (possibly nonlinear) combinations 
of these parameters that give rise to exactly the same emergent 
Stokes profiles. Among these degeneracies, we can find the well- 
known ambiguity associated with the projection of the magnetic 
field vector on the plane of the sky, the degeneracy between 
the filling factor and the longitudinal component of the mag- 
netic field strength (the magnetic flux density) in the weak-field 
regime and less-known degeneracies between thermodynamical 
and magnetic p arameters for magnetic field stru ctures organized 
in small scales (Martinez Gonzalez et al. 2006). Third, since the 
optimization problem is usually solved with the aid of gradi- 
ent descent methods like the Levenberg-Marquardt scheme, the 
solution given by the inversion code might not be that corre- 
sponding to the global minimum (in case a global minimum is 
present). 

Recently, several works have faced t his problem from a dif- 
ferent point of view. On the one hand, lAsensio Ramos! (2006) 
has introduced the usage of model selection algorithms for the 
interpretation of spectro-polarimetric observations. Given a set 
of possible models that can be used to describe the observa- 
tions, these algorithms help us select the most probable one 
using a quantitative approach. These algorithms, based on the 
Occam's Razor, favor models that better fit the observations with 
a reduced set of parameters, while disfavoring too complicated 
models even if they match the observation s or those that badly 
fit the observables. On the other hand, lAsensio Ramos et alj 
(2007b|) have applied algorithms based on geometrical consid- 
erations to estimate the intrinsic amount of information present 
in the Stokes profiles. The intrinsic dimension of the manifold 
in which the observables lie can be associated with the number 
of independent free parameters that can be used when propos- 
ing a model to describe the observations. They have also shown 
that the amount of information present in an observed dataset 
increases monotonically with the number of sp ectral fines in- 
cluded. Also of interest is the work carried out bv lSocas-Navarrol 
(2004b) for estimating the level of detail of the stratification of 
atmospheric parameters one can obtain from selected spectral 
lines. 

An important advance in the development of inversion codes 
was the application of database search algorithms in conjunc- 
tion with Principal Co mponent Analysis (PCA) to the inver- 
sion of Stokes profiles jRees et al.ll2000t lLopez Ariste & Casinil 
l2002tlSkumanich & Lopez Aristdf2002t ICasini et al.ll2005l) . For 
the moment, this inversion technique has been applied only to 
simplified ME atmos pheres and to microstructured magnetic at- 
mospheres (MISMA: TSocas-Navarro & Sanchez Almeidair2 002) 
but nothing (except for a computational problem) avoids using 
more complicated models. It is based on the direct comparison 
between the observed Stokes profiles and all the possible ones 
that can be built by varying the parameters that describe the 
model atmosphere. This comparison is not done with the pro- 
files themselves, but with the coefficients of the projection of 
both the observed and theoretical profiles into a given basis. The 
key point of the PCA inversion is that this basis set is obtained 
from the synthetic Stokes profiles themselves. Consequently, it 



already encodes valuable information about the line formation 
mechanism. The fact that we compare the observed Stokes pro- 
file with the whole database allows us to reach the global mini- 
mum, instead of getting stuck in local minima. A subproduct of 
using a database is that it is possible to define an error bar and 
give uncertainties into the inferred physical parameters. 

This paper addresses the development of an inversion 
scheme that allows to characterize the probability distribution of 
parameters of the model that better fit the observed Stokes pro- 
files. To this end, we adopt a Bayesian approach to infer the most 
probable values of the parameters and to extract their confidence 
levels. 



2. Bayesian Inversion of Stokes Profiles 

Our aim is to develop an inversion code that can obtain all the 
physical information present in the observed Stokes profiles and 
that can give us detailed statistical information. This statistical 
information allows us to estimate a real error bar for each param- 
eter and whether a parameter of a given model is constrained by 
the observables or not. This information turns out to be funda- 
mental so that one can trust the value of the inferred parameters 
and properly analyze the observations. 

2.1. Forward modeling 

The Bayesian formalism is extremely general and can be ap- 
plied to any model that explains a given set of observations. We 
are interested in the Stokes profiles emerging from a given atmo- 
sphere. Let S = (/, Q, U, Vy be the Stokes vector (f indicating 
transpose). The vectorial radiative transfer equation describes 
the variation along a given ray of the Stokes vector S depend- 
ing on the absorption and emission properties of the medium: 



as 



(1) 



where e = (e/, eg, en, ey) is the emission vector and K is the 
propagation matrix: 
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In principle, once e and K are known for all the points along 
the considered ray, it is possible to solve Eq. (HJ and ob- 
tain the synthetic emergent Stokes parameters. However, for 
simplicity, we will focus on the Milne-Eddington approxi- 
mation, although we plan to apply Bayesian inversion tech- 
niques to other more complex problems. Of interest is the 
case of inversion under local thermo dynamic equilibrium (LTE; 
Ruiz Cob o & del To ro Iniesta] Il992[) in which strong degen- 
eracies may be present (see iMartfnez Gonzalez et al.l 12006) 
and the case of scattering polarization and the Hanle effect 
with many (and even unknown) degeneracies ( House! 19771: 
Casini & Judge 1999; Truiillo Buenolll999l 120011: ICasini et al 
2005 |). In the ME approximation (see , e.g., lAuer et all |197 
Lan di DeglTnnocenti & Landolfi 2004), we assume that the ra 
tio between the line absorption coefficient and the continuum ab- 
sorption coefficient does not vary with depth in the atmosphere 
and that the line source function has a linear dependence on the 
optical depth along the line-of-sight. Furthermore, we assume 
that the magnetic field vector B and the bulk velocity are con- 
stant with depth. 
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Here we focus on the Zeeman effect as the mechanism 
that generates and modifies the polarization state of the at- 
mosphere. In this case, the elements of the propagation ma- 
trix and of the emission vector can b e easily calculated (e.g., 
lLandi Degl'Innocenti & Landoifil2 004). These elements depend 
on the strength of the magnetic field and on the specific orien- 
tation of the field vector with respect to the line-of-sight. After 
these assumptions, the well-known Milne-Eddington analytical 
solution of the radiative transfer equation can be applied. In the 
code, the effect of the magnetic field on the energy levels can 
be treated under the simple linear Zeeman regime, under the 
more general incomplete Paschen-Back regime or even hyper- 
fine structure can be included. 



confidence of the parameters. Consequently, degeneracies, am- 
biguities and the rest of problems that arise in typical inversion 
codes (except for those based on PCA) can be investigated in 
great detail. 

Let us analyze in detail the terms appearing in the right hand 
side of in Eq. OJ. As mentioned above, the prior distribution 
contains all the information that we know about the parameters 
without taking into account the observed data. In the most sim- 
ple case, we can assume that all the parameters are statistically 
independent, so that the prior distribution can be written as: 



m = Y\m\ 



(4) 



2.2. Posterior probability 

The interest to extract all the information available in the obser- 
vations has led to the systematic application of methods based 
on the Bayesian approach. A myriad of problems can be tack- 
led under this framework that has strong theoretical roots. We 
present the fundamental ideas of the formalism, although much 
more det ailed inform ation can be found in several monographs 
(see e.g.. lNeal|[l993l) . Let us assume a model M that is used 
to describe a given dataset D. In our case, the model M is the 
Milne-Eddington approximation. It is parameterized in terms of 
the vector of physical quantities 9 that contains the usual ME pa- 
rameters: doppler width of the line in wavelength units (A/ldoppX 
wavelength shift due to a macroscopic bulk velocity (v mac ), gra- 
dient of the source function (J3), ratio between the line and con- 
tinuum absorption coefficients (770), line damping parameter (a) 
and magnetic field vector parameterized by its modulus, incli- 
nation and azimuth with respect to a given reference direction 
(B, 9b and <f>s, respectively). It is customary to have some ini- 
tial information about the physical parameters. For instance, an 
estimation of the range of variation of the physical parameters 
might be available, although sometimes it can be a very rough 
one (for instance, a limitation to positive or negative values). 
This information is incorporated into a prior distribution p(9). 
When the information contained into the data D is incorporated 
in the problem, our state of knowledge of the parameters change 
according to the Bayes theorem: 



P (9\D) <x p(9)p(D\9). 



(3) 



The posterior distribution p(9\D) represents our state of knowl- 
edge of the parameters once the information of the dataset 
has been taken into account. The term p{D\9) is the so-called 
likelihood function and gives information about how well a 
particular set of parameters predicts the observed data. The 
Bayes theorem states that whether a model M becomes plau- 
sible after the data D has been taken into account depends 
on how plausible the model was before taking into account 
the data and how well the model predicts the data. The sim- 
plicity of the Bayes theorem hides all its potential and this 
kind of reasoning has led to a variety of appl ications: it has 
been widely used in cosmological analyses (e.g., [Lewis & Bridle] 
l2002t iRubifio-Martin et al J 120031: iRebolo et alJl2004 . gravita- 
tional wave analys es (e.g., [C ornish & Crowder 2005), gravita- 
tional lensing (e.g., Brewer & Lewis 2006), oscillation of solar- 
like stars (e.g.. lBrewer et al.ll2007 ) and many more. A powerful 
inversion code can be built based on the Bayes theorem. Once 
the posterior distribution p(9\D) is known, the position of the 
maximum value gives the most probable combination of param- 
eters that fit the data. Not only this, but we can also analyze the 



where the {#, } are the parameters included in the model and N p!lr 
is the number of such parameters. Unless physical information 
is available, we typically only know the range of variation of the 
parameters, so that we can write: 



p(9 i )=H(9 i ,9f D ,9™ x ), 

where H(x, a, b) is the top-hat function: 



H(x, a, b) ■ 



1 

b-a 





a < x < b 
otherwise 



(5) 



(6) 



As an example, consider the prior of a uniform magnetic field 
vector B. In order to guarantee such an uniform magnetic 
field vector, we have to sample uniformly the volume element 
dV = r 2 dr d(cos 9b) d<pB- We can assume that the magnetic field 
strength cannot be larger than fi max , so that its prior is a top-hat 
function that is non-zero in the interval [0, B^^]. When focusing 
on the solar atmosphere, a reasonable choice is B max » 4000 G 
so that all the physically relevant cases can be covered (quiet 
Sun, sunspots, faculae, etc.). The inclination and the azimuth 
have to be obviously limited to the ranges [0, n] and [0, 27r], re- 
spectively. If we neglect any correlation between the magnetic 
field strength, inclination and azimuth, the final prior on the mag- 
netic field vector is given by: 



P(fi) 



H{B\0,Bl 



c )tf(cos 9 B , -I, \)H{4>b, 0, 2tt). 



(7) 



Interestingly, it is possible to include correlations between the 
parameters. As an example, let us assume that the stronger the 
magnetic field, the more vertical it is. Additionally, weaker fields 
can be found with all kinds of inclinations. A simple prior distri- 
bution that fulfills the previous assumptions is: 



p(B) = ^{l+exp[-(B-B raax ) 2 /cr2]expf-^/^]} 



C 

x H(4> B ,0,2n), 



(8) 



where cr 2 and cr 2 g control the shape of the prior and C is a nor- 
malization constant. 

The second term, p(D\0), termed the likelihood, measures 
the probability that a model determined by a set of parameters 6 
fits a given observation D. For simplifying the notation, it is ad- 
vantageous to particularize to the case of the inversion of Stokes 
profiles. In spite of this particularization, the method remains 
still very general. The data D that we are facing consists on a set 
of four vectorial quantities, i.e., the wavelength dependence of 
the four Stokes parameters. The number of wavelength points in 
each Stokes profile is indicated by Na. The value of the Stokes 
parameter i = 0, 1,2,3 (that we associate with the more usual 
notation I, Q, U and V) at a wavelength Aj is represented by the 
quantity S° bs (Aj). When these Stokes parameters are observed 
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with a spectro-polarimeter attached to a telescope, they contain 
a certain level of noise. If these observational errors are inde- 
pendent and have a Gaussian distribution, their distributions can 
be described with their standard deviations cr^Aj), i.e., the noise 
level for each Stokes parameter at wavelength Aj. Strictly speak- 
ing, the noise in the observed Stokes profiles should be poisso- 
nian because it comes mainly from photon noise. However, for 
consistency with other works, we choose the noise to be nor- 
mally distributed, which will be a good approximation if the 
number of photons is high enough. Typically, we will deal with 
wavelength-independent noise, so that only the four quantities 
<7j are needed. Let S * be the Stokes parameters that emerge 
when the forward problem is solved in a given model M param- 
eterized by the vector of parameters 6. Taking into account the 
previous definit ions and assum ptions, the likelihood function is 
defined as (e.g., lMacKavll2003l) : 

p(D\6) oc e-^ x \ (9) 
where we have introduced the usual merit function^ 2 : 

1 (Aj) - Sl (Aj) \ 

4AT,Z^ at j 

Although we have focused on wavelength-independent noise, 
the formalism allows to accommodate wavelength-dependent 
noise by using cr,(/lj) instead of cr, in the likelihood. Therefore, 
if the information is available, it is possible to include other 
sources of uncertainty like reduction residuals, cross-talk, 
fringes, etc. The x 1 function that is typically used for the in- 
version of Stokes profiles presents the weights fw,-, i = 1 . . .4} 
for each Stokes parameter. This differential weighting scheme is 
not applied here, but the method can accommodate it straight- 
forwardly. The only influence of this weighting is to change the 
width of the maximum likelihood regions (reducing or expand- 
ing the confidence regions around the maximum). However, the 
location of the maximum is not changed. 

3. Markov Chain Monte Carlo 

In the Bayesian framework, the most plausible model is the one 
that maximizes the posterior distribution. Our objective is then 
to sample the posterior distribution and to find the combination 
of parameters that produce this maximum value. This will repre- 
sent the most plausible model that matches the observed Stokes 
profiles. For a small number of parameters N pm -, this brute force 
approach might be achievable. For instance, assuming that ten 
values per parameter are desired, something like lQ N f evalua- 
tions of the posterior distribution are needed. This implies that 
the forward model has to be evaluated a huge number of times. 
When iVpar < 5, such a direct approach can be of applicability 
in case the computing time per evaluation of the forward model 
is not very large. However, this brute force approach quickly be- 
comes impractical because the number of function evaluations 
increases exponentially with the number of free parameters. In 
order to overcome this difficulty, we have applied a Markov 
Chain Monte Carlo technique. Since this is the first time that 
such a method is applied to the inversion of Stokes profiles, 
we have decided to present in detail some important techni- 
cal issues in Appendix A, although they are widely known in 
other research fields. Briefly, our implementation of the Markov 
Chain Monte Carlo scheme is based on the Metropolis algorithm 
dMetropolis et all 1 195 3t lNeallll993l) . The proposal density dis- 
tribution is chosen to be a multi-variate gaussian with diagonal 



covariance matrix. O ur code uses the convergence criterium of 
iDunklev et al.l (120051) although other criteria are discussed in the 
Appendix. 

4. Illustrative examples 

In order to show the capabilities of the newly developed code, 
several examples are shown. Some of them deal with synthetic 
data where we can investigate the behavior of the method un- 
der controlled conditions. After these synthetic tests, we apply 
the code to a realistic case obtained from spectro-polarimetric 
observations. 

4. 1 . Simple academic example 

The first example serves as an illustration of how the MCMC 
method is able to capture the presence of degeneracies. To this 
end, a very simplified example is presented, where we make use 
of a Zeeman triplet line, namely the Fe i line at 630.2 nm. The 
emergent Stokes profiles are calculated in a Milne-Eddington at- 
mosphere. The value of the parameters are: Avd op p=2.4 km s , 

v'mac =0kms- 1 ,/3 = 9,770 = 9.8, a = 0.3, B = 100 G, 6 B = 45° 
and <ps = 0°. The main characteristic of such synthetic profiles is 
that the magnetic field strength is so weak (B — 100 G) that the 
Zeeman splitting is negligible compared to the Doppler width of 
the line. As a consequence, the emergent Stokes profiles can be 
described in the weak field regime of the Zeeman effect, in which 
the Stokes V profile is proportio nal to the wavelength deri va- 
tive of the intensity profile (e.g jLandi DeglTnnocenfilll992l) . It 
is widely known that only the line-of-sight component of the 
magnetic field vector (i.e., the product BcosO, with 9 the angle 
between the line-of-sight and the magnetic field vector) can be 
obtained from the amplitude of the Stokes V profile. On the con- 
trary, when linear polarization is also present, the precise mag- 
netic field vector can also be obtained. Since linear polarization 
appears as a second order contribution to the emergent signal, 
they are difficult to observe and a reduced noise level is funda- 
mental. In this section we present an analysis with the aid of the 
MCMC code on how the information retrieved from the previ- 
ously described synthetic Stokes profiles degrades with the pres- 
ence of noise. The noise is described by a Gaussian distribution 
parameterized by the value of cr, which is here given in units of 
the continuum intensity, I c . 

We run the MCMC code on the synthetic Stokes profiles 
taking into account the full Stokes vector for the calculation of 
the likelihood function given by Eq. (0. All the thermodynam- 
ical parameters and the azimuth are assumed to be known and 
we only allow the magnetic field strength and the inclination 
of the field to vary. We test that the obtained Markov chain is 
converged for the two parameters as indicated in Appendix A. 
For informative purposes, and although it depends on the com- 
plexity of the problem, our Markov chains require lengths of the 
order 50000 accepted samples to fulfill the convergence crite- 
rion, with a total computational time of the order of 20 seconds 
on a standard computer. Finally, taking into account that the ob- 
tained Markov chain is sampling from the posterior probability 
distribution, the posterior itself can be obtained simply by "mak- 
ing histograms". Two different cases with different amounts of 
added noise have been considered. A case in which the added 
noise level is cr = 10~ 5 , whose results are shown in Fig.Q] and 
a case with a much larger noise of <x — 10~ 3 , whose results are 
shown in Fig. [2] The two-dimensional histograms shown in the 
right panels of both figures present a graphical representation of 
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Magnetic field strength [G] Inclination [cleg] Magnetic field strength [G] 

Fig. 1. Posterior probability distribution for the simple academic example with a noise cr = 10~ 5 in units of the continuum intensity, 
I c . The full Stokes vector {I,Q,U,V) is taken into account. The left panel shows the marginalized distributions for the magnetic field 
strength while the central panel shows the distribution for the inclination. The right panel shows the two-dimensional posterior 
distribution. The contours indicate the confidence levels at 68% (solid line) and 95% (dashed line). 




50 100 150 200 250 300 350 20 40 60 80 50 100 150 200 250 300 350 

Magnetic field strength [G] Inclination [deg] Magnetic field strength [G] 

Fig. 2. Posterior probability distribution for the simple academic example with a noise cr = 10~ 3 in units of the continuum intensity. 
The posterior clearly shows a degeneracy between the magnetic field strength and the inclination. The dashed thin line in the right 
panel indicates the points where B cos 9b = B' cos ff B , where the primed quantities are those belonging to the synthetic profile, 
namely, fl'=100 G and 6^=45°. 




Line — of — sight magnetic field [G] 

Fig. 3. Posterior probability distribution of the line of sight com- 
ponent of the magnetic field, B cos 9b for the simple academic 
example with a noise cr = 10~ 3 in units of the continuum in- 
tensity. The posterior clearly shows a peak compatible with the 
original value. 



the posterior distribution of the magnetic field strength and in- 
clination, p(B, 6b)- We show two contours indicating confidence 
levels of 68% and 95%, respectively. The case cr = 10~ 5 shows a 
clearly peaked posterior distribution, indicating that a very good 
estimation of the magnetic field strength and inclination is pos- 
sible. On the contrary, the case cr = 1CT 3 presents a clear de- 
generation between both parameters, manifested by the typical 



"banana-shaped" posterior distribution. The main reason for this 
extended posterior distribution is that the Stokes Q and U sig- 
nals are masked below the noise level. For such a high noise 
level, only the information encoded in the Stokes V signal is 
available for retrieving the magnetic field strength and inclina- 
tion. Since the field is only 100 G, the line is in the weak-field 
regime so that only the product B cos 9b can be estimated from 
Stokes V. In order to make sure that this is indeed the case, we 
have overplotted the curve B cos 9b = B' cos 9' B with B' =100 G 
and 9' B =45°, which closely follows the shape of the posterior. 
Figure[3]shows the marginalized distribution of the line-of-sight 
component of the magnetic field, B cos 9b, showing that it can 
be recovered with accuracy. Marginalized posteriorly P(B) and 
P(9b) are also shown in the left and central panels of Figs. Q] 
and [2] Sharp distributions are found for the case with cr = 10~ 5 
noise, while distributions with enhanced tails are found for the 
case cr = 10~ 3 . Curiously, according to the marginalized dis- 
tributions, a somewhat good estimation of the field strength is 
possible even for this highly noisy profiles, although there is a 
non-negligible tail for larger field strengths. Concerning P(9b), it 
gives reduced information about the inclination, clearly showing 
the B cos 9b degeneracy. For comparison purposes, we have also 
applied an inversion code based on the Levenberg-Marquardt al- 
gorithm to estimate the parameters and their confidence intervals 
for the case with cr = 10~ 3 . The minimum of the^ 2 function is 
correctly obtained for B = 98.5 G and 9 B = 44.1°. However, 
the symmetric confidence intervals that we obtain using the di- 

1 They are obtained by integrating the two-dimensional histogram 
with respect to one of the variables. 
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agonal elements of the covariance matrix (e.g. JPress et al.l l986) 
produce an estimation of 98.5 + 180.9 G for the magnetic field 
strength and 44.1 + 100.5 degrees for the magnetic field incli- 
nation. According to the estimated error, the field inclination 
is not constrained by the observations. These results provide a 
poor estimation of the confidence intervals as compared to the 
marginalized posterior distributions shown in Fig. [2] 

The utility of this test is two-fold: on the one hand, we have 
demonstrated, with a simplified problem, the correct operation of 
the MCMC inversion code; on the other hand, we point out the 
obvious importance of having accurate Stokes profiles in order 
to recover information about the magnetic field vector. 

4.2. The problem of the quiet Sun 

After the presentation of a simple instructive example, we focus 
now on a more realistic problem that presents deep implications 
on the recovery of information about the magnetism of the quiet 
solar photosphere. The quiet Sun are those regions away from 
the most evident manifestations of magnetic activity. In the pho- 
tosphere, it corresponds mainly to the network (magnetic flux 
concentrations in the supergranular boundaries) and the inter- 
network (filling up the interior of supergranular cells). At the 
present spatial resolution of ground-based spectropolarimetric 
observations (0.5 - 1") the magnetic structur es on the quiet Sun 
are thought to be not spatially resolved (e.g..lSte nflo 1994l_ Linl 
19951; IDomfnguez Cerdena et alj 120031; iKhomenko etall 120031; 



Martinez Gonzalez et al. 2006). This has been demonstrated by 
Asensio Ramos et all ([2007a), presenting the first map of flux 
cancellation in the quiet Sun. The magnetism of the network is 
widely established as predominantly vertical kG structures fill- 
ing approximately 10-20% of the resolution element. However, 
the problem turns out to be more complicated in the internet- 
work, where the polarization signals that we can measure by 
means of the Zeeman effect are unresolved, occupying only the 
1 - 2 % of the resolution element. Typically the Stokes V pro- 
files have an amplitude of 10~ 3 in units of the continuum inten- 
sity, I c . The noise that we can achieve in the observational data 
(~ 10~ 4 I c ) is only one order of magnitude smaller than the po- 
larimetric signals in the internetwork. As it has been shown in 
the previous section, it is important to have a reduced noise level 
in order to obtain information about the magnetic field from the 
observed Stokes profiles. In the internetwork, when the widely 
observed pair of Fe 1 lines at 630 nm are used, no linear po- 
larization signal above the noise level is found with the cur- 
rent instrumentation. However, even if there is a lack of signal 
in Stokes Q and U, we could retrieve magnetic field strengths 
when the line is out from the weak field regime. In this partic- 
ular pair of lines and for the typical photospheric physical con- 
ditions, the line can be considered in the weak field regime for 
fields below ~ 600 G. This would mean that the kG magnetic 
field stren gths retrieved from this pair of lines would be reli- 
able (e.g..lGrossmann-Doerth et al.ll 19961: IS igwarth et al. 1999; 
IDomfnguez Ce rdena et al. 2003; Sanchez Almeida et al.ll2003l) . 

However, iMartfnez Gonzalez et al.l d2006l) have shown that 
this results should be regarded with care. They show the most 
simple case in which the thermodynamics compensates the ef- 
fect of a magnetic field. These authors used the SIR0 code 
dRuiz Cobo & del Toro Iniestall 19921) to synthesize the emergent 
Stokes profiles using the typical physical conditions of the in- 
ternetwork. The inversion of such profile with random initial- 
izations showed that the resulting atmospheres depended on the 



initialization itself if a noise level of 5 x 10~ 5 I c is assumed. In 
each case the change in the magnetic field was compensated by 
a small change in the magnetic temperature gradient (smaller 
than 300 K) and a slightly increase of the microturbulent ve- 
locity (below 1.5 km/s). The change in the temperature gradient 
produces a modification on the Stokes V ratio of the two spectral 
lines while the increase in the microturbulent velocity leads to a 
broadening of the line profile. This procedure clearly demon- 
strated the degeneracy of the inversion problem in this particular 
case. Unfortunately, the Levenberg-Marquardt algorithm used 
in the SIR code for the inversion of Stokes profiles does not 
produce a reliable and well-defined estimation of the errors in 
the p arameters that describe the at mosphere. This is the reason 
why Martinez Gonzalez et al. (2006) showed the degeneracy of 
the inversion problem by using repeated inversions with random 
initializations. In this paper, we follow the study performed by 
Mart inez Gonzalez et al.l (120061) and we extend it to the cases in 
which we increase the filling factor (we improve the signal to 
noise ratio) or we add a particular inclination to the magnetic 
field vector (we generate linear polarization signal). However, 
this time the solution is based on robust statistical techniques. 

The interpretation of the weak field regime in the 
case that the magnetic feature is resolved is straightfor- 
ward. In the weak field approximation, the radiative trans- 
fer equation has an analytical solut ion (see Chapter 9 of 
lLandi DeglTnnocenti & Landolfil2 004, for the conditions under 
which this approximation is valid). The Stokes V profiles can be 
written as: 



V(A) = -4.6686 x 10 _13 £^fi cos 9 



MX) 
dA ' 



(ID 



where g is the effective Lande factor of the line, Aq is the central 
wavelength of the spectral line given in A and B is the magnetic 
field strength given in G. The simultaneous observation of the 
Stokes / and V profiles allows us to compute the product B cos 9. 
The situation in the quiet Sun is not so straightforward since the 
magnetic structures occupy a very small portion of the resolution 
element. Then, the modelization of these areas requires at least 
two components: a magnetic component that gives rise to the po- 
larization signals and a non-magnetic one that accounts for the 
rest of the pixel that is field-frefl Two complicated problems 
arise due to this particularity. First, the right-hand side of Eq. 
(fTTT i has to be multiplied by the filling factor and the value that 
we will recover would be the longitudinal magnetic flux density 
aB cos 9. Second, the intensity profile that applies in Eq. dTTb is 
the one coming from the magnetic component. Then, the product 
aB cos 9 cannot be computed from the ratio of the Stokes V pro- 
file and the wavelength derivative of Stokes /, since the observed 
Stokes / is coming mainly from the non-magnetic component. If 
one still wants to use the previous approach, the only way to 
recover the product aB cos 9 would be by computing a calibra- 
tion curve. This means that we have to assume a model atmo- 
sphere and compute the Stokes V profile for different values of 
the longitudinal magnetic flux density. As a result, the inferred 
magnetic field is model dependent. In other words, the Stokes V 
profiles depend on the magnetic and thermodynamic properties 
so, if one wishes to infer the magnetic properties of the plasma, it 
is fundamental to fix the thermodynamical properties first. The 
only technique available to overcome this difficulty is to apply 



Stokes Inversion based on Response functions. 



3 The term field-free might result confusing since this component can 
indeed present a magnetic field that, due to its special structure, presents 
a zero Zeeman signal (e.g., microturbulent distribution, isotropic distri- 
bution, etc.). 
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Fig. 4. Two-dimensional posterior distributions for several combinations of the parameters for the internetwork synthetic example 
with a longitudinal magnetic flux density of 10 Mx/cm 2 . The contours indicate the regions where 68% and 95% confidence levels are 
placed. Large degeneracies are present in almost all the parameters, except for the gradient of the source function and the damping 
in the non-magnetic component. 
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Fig. 5. Two-dimensional posterior distributions for several combinations of the parameters for the network synthetic example with 
a longitudinal magnetic flux density of 200 Mx/cm 2 . The contours indicate the regions where 68% and 95% confidence levels are 
placed. The magnetic field strength can be recovered better than in the internetwork case shown in Fig. [4] However, increased 
degeneracies are also seen in the parameters of the non-magnetic component due to the reduced surface covered by this component. 



inversion techniques. However, one has to have in mind that the 
information encoded in the Stokes / profile (that has ~99 % con- 
tribution from the non-magnetic component) and in the Stokes V 
profile is not enough to constrain the problem and to recover in a 
trustable way all the atmospheric parameters in a two component 
model. 



4.2.1. Recovering the magnetic field strength in the 
internetwork 

In order to investigate in detail this last point, we deal with syn- 
thetic profiles that can be representative of the quiet Sun to see 
how well we can recover the magnetic field separately from the 
rest of the parameters. We synthesize the Fe i lines at 630.1 and 
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are placed. In spite of the presence of more information encoded in the linear polarization Stokes profiles, the marginalized posterior 
distributions show shapes very similar to those found in Fig. [4] 



630.2 nm using a two component model. Both atmospheres have 
the same values of the ME parameters except for the magnetic 
field strength and the filling factor. For informative purposes, the 
value of the parameters are: Avd op p=0.05 A , v mac = km s _1 , 
T]* 30 - 1 = 5.0, rf 301 = 4.5, a = 0.45 A , and/? = 8. The magnetic 
flux density is fixed to 10 Mx/cm 2 , representative of the typical 
value in the internetwork. Since the magnetic field strength is 
set to 1000 G and it has been assumed to be vertical, the filling 
factor of the magnetic component is set to 1%. A certain amount 
of noise, characterized by a normal distribution with a standard 
deviation of 10 I c , is added to the profiles. 

It is important to point out that, under the framework of a 
Milne-Eddington atmosphere, none of the parameters is strictly 
equivalent to the temperature or the microturbulent velocity that 
are present in the LTE approximation used by SIR. Accordingly, 
we select the gradient of the source function, /?, and the damp- 
ing coefficient, a, in both components together with the magnetic 
field strength and the filling factor as the free parameters in our 
test. Figure [4] summarizes the results of the Bayesian inversion. 
All the upper panels show the enormous degree of degeneracy 
between the magnetic field strength and the rest of parameters. 
The upper left panel indicates that magnetic fields with all values 
below 2000 G can reproduce the profile with an accuracy bet- 
ter than two times the noise level. Furthermore, magnetic field 
strengths between 400 and 1800 G fit the profile with a confi- 
dence level smaller than the noise level (see an example of two 
possible fits with different field strengths in Appendix B). An 
interesting behavior is shown in the central and right upper pan- 
els. The posterior distribution presents almost no variation along 
these directions (damping parameter and gradient of the source 
function) and they are only limited by the ranges that we have 
assumed for them. Therefore, this means that the data has pro- 
vided no new information for constraining these parameters (flat 
likelihood) and we are only recovering information about the pri- 
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Fig. 7. Marginalized distribution of the inferred magnetic field 
strength (solid line) together with the prior one (dashed line). 
Both distributions are very similar, making us consider that the 
information contained in the emergent line profiles is very small. 

ors. This is the typical example in which, due to the lack of in- 
formation, the result depends critically on the prior information 
and one should be very cautious with the conclusions inferred 
from the calculations. Finally, the lower left and central panels 
show the marginalized posterior distribution for the longitudi- 
nal magnetic flux density and the damping a and the gradient of 
the source function /?, respectively. Again we see that the line 
profiles carry reduced information about these parameters. Even 
more striking is the fact that the longitudinal magnetic flux den- 
sity is recovered with ~ 50% error at a 68% confidence level. 
On the contrary, the parameters of the non-magnetic component 
are recovered with precision, as stated in the lower right panel of 
Fig. [4] with differences with respect to the input values that are 
well below 2.5% in both cases. 
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The previous analysis demonstrates that it is difficult to ob- 
tain reliable information from Stokes profiles representative of 
internetwork regions. However, what happens in strongly mag- 
netized areas like the network, where the magnetic fluxes are 
10-20 times higher than in the internetwork? To investigate this 
issue, we use the very same ME parameters but we assume an en- 
hanced magnetic flux density of 200 Mx/cm 2 , where we have in- 
creased the filling factor of the magnetic component to 20%. The 
marginalized posterior distributions are shown in Fig. [5] In this 
case, both the magnetic field strength and the longitudinal mag- 
netic flux density are well recovered, together with the damping, 
a. However, it is interesting to point out that, as a consequence 
of the reduced filling factor of the non-magnetic component with 
respect to the internetwork case, the gradient of the source func- 
tion of the non-magnetic component presents a more extended 
posterior distribution. This behavior is easy to understand be- 
cause there is less information about the non-magnetic compo- 
nent encoded in the Stokes profiles produced by the large filling 
factor of the magnetic component. 

4.2.2. Inclined fields 

Coming back to the internetwork, it is of interest to investigate 
the shape of the marginalized posterior distributions when the 
magnetic field vector is inclined with respect to the line of sight. 
In this case, the information provided by the linear polarization 
profiles can lead to better constraints. We use the same synthetic 
profile with a magnetic flux density of 10 Mx/cm 2 and we as- 
sume inclinations of 20°, 45° and 70°. In the first case, the Stokes 
Q signal is below the noise level and it is not surprising that the 
results are comparable to the ones in which the magnetic field 
vector was assumed to be vertical. The case of an inclination 
of 70° shows the same behavior since, in this case, the Stokes 
V signal is below the noise level. Figure [6] shows the results 
of the inversion for the intermediate case of 9 — 45°. In this 
case, strong signatures of degeneracy are detected. The upper 
left panel shows that the magnetic field strength is concentrated 
in high values. However, contrary to what one could think, this 
does not mean that the value of the field strength is better recov- 
ered. First, we can see the large degeneracy with the other pa- 
rameters. Second, Fig. [7] shows that the marginalized posterior 
distribution for the magnetic field strength strongly resembles 
that of the prior. It seems that even the inclination angle cannot 
be constrained with the available information. One of the reasons 
is that, although the Stokes Q and U signal might become larger 
than the noise, the Stokes V signal decreases and gets closer to 
the noise. Therefore, some information available in Stokes V is 
hidden by the presence of noise. 

4.3. Realistic examples 

In order to demonstrate the capabilities of the MCMC code, we 
show an application to realistic Stokes profiles. They correspond 
to a positi on on an umbra of a sunspot obse rved during August 
17, 2004 dSainz Dalda & Lopez Aristell2007l) . The observation 
was carried out with the THEMIS telescope at the Observatorio 
del Teide (Spain). The telescope was operated in the MTR mode, 
so that the polarization analysis was performed for each wave- 
length at each pixel. Although the observation consisted on a 
scan over a sunspot, for the purpose of demonstrating the ca- 
pabilities of the MCMC code, we only focus here on the infor- 
mation obtained in one pixel of the whole scan. The observed 
spectral region contains the previously mentioned 630 nm pair 



of Fe i lines. Figure[9]presents, in solid line, the observed Stokes 
profiles. The noise level estimated from the continuum where no 
polarization signal is detected is of the orderof 1.6x10 3 in units 
of the continuum intensity. This spectral region consists on two 
Fe i lines at 630. 1 and 630.2 nm, together with two telluric con- 
tributions. The wavelength calibration has been carried out with 
the aid of the two telluric lines. The 630.2 nm line presents a 
higher magnetic sensitivity and this translates into an enhanced 
Zeeman splitting that can be also detected in the Stokes I profile. 

The MCMC code has been applied to both spectral lines sep- 
arately. We leave all the Milne-Eddington parameters free but 
we only focus on the results concerning the magnetic field vec- 
tor. Stray-light contamination from the surrounding quiet Sun 
is also taken into account. The results indicate a filling factor 
of the umbral component in the range 91-94%, with a confi- 
dence interval of the order of +3%. Figure [8] shows the results 
obtained from the inversion of the 630.2 nm Fei line. We show 
posterior probability distributions marginalized over all parame- 
ters except for one and except for two. The results shown in Fig. 
[8] indicate that the information encoded in the observed data is 
enough to constrain the characteristics of the magnetic field vec- 
tor. Except for the case of the azimuth of the field, the marginal- 
ized one-dimensional probability distribution functions present 
an asymmetric non-gaussian shape, with extended wings. The 
parameters are nicely constrained by the observations and the 
inferred values are given in each plot, together with the 68% 
confidence interval. Concerning the two-dimensional distribu- 
tions, we show them as contour plots, where the 68 and 95% 
confidence levels are indicated. 

Concerning the Fe i line at 630. 1 nm, the results are defini- 
tively worse. The posterior distributions are much broader than 
for the 630.2 nm line and they present strong degeneracies. 
Several points deserve a more profound discussion. First, the 
elongated shape of the p{B, 9b) posterior indicates a certain de- 
gree of degeneracy between both parameters. The reason for 
this behavior is that the 630. 1 nm line is still in the transition 
from the Zeeman weak-field regime to a saturation regime. As 
a consequence, the B-9b degeneracy that we have discussed in 
ij4.1| introduces problems in the unique determination of the 
field strength and inclination. Second, it is important to point 
out the fact that these results have been obtained assuming 
<x = 1.5 x 10 3 in units of the continuum intensity. This is a rel- 
atively large value which poses a relaxed tolerance in the quality 
of the fit, thus resulting in increased tolerance in the inferred pa- 
rameters. Clearly, due to the different magnetic sensitivity, noise 
affects differently to both spectral lines. Since the Zeeman split- 
ting in the 630.2 nm line is clearly visible, the information about 
the magnetic field strength is readily available from the peak sep- 
aration in the Stokes V profile. This separation is much less af- 
fected by noise. Once the field strength is fixed, the inclination 
and azimuth of the field are easily obtained. Contrarily, since the 
630. 1 nm line is partially in the weak-field regime, the magnetic 
field strength has to be obtained from the amplitude of the Stokes 
V profile, together with the rest of Stokes parameters. The esti- 
mated value of the field strength crucially depends on the value 
of the tolerance cr. As a proof of this, we have verified that the 
shape of the p{B, 9b) surface shown in Fig. [10]approximately fol- 
lows cos 9b °c l/B and that the width is related to the tolerance 

cr. 

The results of both inversions should also be regarded in con- 
j unction, as shown in Fig . QT| If the line formation region of both 
lines would have been exactly the same, one would expect to find 
equivalent results from both lines. Since this is not the case (e.g., 
Shchukina & Truiil lo Buenoll200ll) . some differences might ex- 
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Fig. 8. One-dimensional and two-dimensional marginalized posterior distributions for the parameters that define the magnetic field 
vector for the inversion of the Fe i line at 630.2 nm. The distributions clearly indicate the presence of a peak, belonging to the value 
of the parameters that produce the best fit. We have indicated in the one-dimensional posteriors the most probable value of each 
parameter (large arrow), together with the 68% confidence interval (small arrows). Note the presence of asymmetric confidence 
intervals. The two-dimensional posteriors have been represented with contours indicating the confidence levels at 68% (solid line) 
and 95% (dashed line). 




Fig. 9. Stokes profiles observed in the umbra of a sunspot (solid lines). The well known 630.2 nm Fe i line presents a larger magnetic 
sensitivity that translates into an enhanced Zeeman splitting. The noise in the observations is of the order of 1 .5x 10~ 3 in units of the 
continuum intensity. The symbols show the fit obtained when taking the most probable value of the parameters inferred from the 
Markov Chain. 
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Fig. 10. Same as Figure[8]but for the Fe i line at 630.1 nm. The two-dimensional representations of the posterior distribution clearly 
indicate the presence of a degeneracy between the inclination and the magnetic field strength. This translates into very broad one- 
dimensional marginalized distributions. 




ist. In spite of this, the posterior distributions clearly overlap in a 
region of the space of parameters that describe the magnetic field 
vector. The results clearly demonstrate that the combination of 
the two lines produces a slight improvement on the restriction 
of the parameters. However, the result is very similar to what 
we find using only the line at 630.2 nm. The field azimuth is 
compatible with a value of 217.4+™ degrees. It is important to 
point out that, for the purpose of showing a less crowded plot, we 
have restricted the range of variation of the azimuth arbitrarily 
to [7t,27t], thus avoiding the presence of ambiguities. However, 
we have verified that the code is able to correctly capture the 
intrinsic azimuth ambiguity when the range of variation is set 
to [0, 2n]. The field inclination given by both lines is consistent 
with 45.4^ 2 degrees, while the magnetic field strength is con- 
sistent with 1768. 6+ ™» G. Figure QT] shows what we consider 
one of the most appealing properties of the Bayesian method for 
the inversion of Stokes profiles that we are presenting in this pa- 
per. It is possible to assess the amount of information given by 
one spectral line individually and combine many lines in order 
to investigate whether the added information helps in better con- 
straining the model parameters. 



Our results tend to indicate that the information obtained 
from the 630.1 nm line alone is very reduced and that it can 
hardly be used to restrict the magnetic field vector for the noise 
level that we have in the observations. As another exercise, we 
have inverted both lines simultaneously following the very same 
scheme as that presented above. We do not show a graphical 
representation of the results because they are very similar to 
those inferred from the 630.2 nm line which can be found in 
Fig. [8j as also suggested by Fig. QT| At the light of the re- 
sults presented here, it is desirable to accumulate information 
from many spectral lines, with the hope that the combin ed ef- 
fect helps us to better constrain the physical parameters dSemell 
ll98lHSocas-Navarroll2004atlAsensio Ramos et al.ll2007bl) 

5. Concluding remarks 

The framework that we have presented here is of very gen- 
eral nature and allows its application to any existing Stokes in- 
version code. Once a model that can be used to calculate the 
emergent Stokes profiles is available, the MCMC method can 
be used to efficiently explore the posterior probability distri- 
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bution function. Presently, we are witnessing an enormous in- 
put of Stokes profiles observatio ns from existing ground- based 
instrumentation like THEM IS dLopez Ariste et al] 12 000), TIP 
dMartfnez Pillet et al.lfl999h and POLIS dBeck et all 120051) and 
with the space-based instrumentation like the recent mission 
HINODE. The pressure will be even larger once the new gen- 
eration of big solar telescopes like GREGOR and ATST arrives. 
Therefore, a huge effort is been put into developing fast inver- 
sion codes that can cope with s uch an amount of observations. 
Inversion codes bas ed on PCA (Rees et al]|2000l) and artificial 
neural networks dSocas-Navarrol 2005b are good candidates for 
such a demanding work. 

Our approach here has a completely different point of view. 
We understand that Bayesian inversions cannot compete in speed 
with these fast algorithms (they cannot even compete with stan- 
dard inversion codes based on Levenberg-Marquardt optimiza- 
tion). However, the Bayesian approach is the only one that can 
be used to investigate in detail the accuracy of inversions, the 
sensitivity of the parameters to the noise and give confidence 
intervals to all the inferred parameters. Furthermore, it can be 
used to rule out a given model for its lack of ability to fit a 
given observed Stokes profile. It is also important to point out 
that our approach can make use of the we ll-developed machin- 
ery behind th e Bayesian formalism (e.g.. iMarshall et al]|2006t 
Liddlel 120071) . For instance, model selection techniques based 
on the calculation of the e vidence can be introdu ced. Similarly 
to the results presented bv lAsensio Ram os (2006), the simplest 
model that better fits the observations is preferred with respect to 
more complicated models (even if they produce a slightly better 
fitting). 

In spite of the intrinsic high computational load of the 
MCMC method, one of its advantages is that it is easily par- 
allelizable. Many Markov chains can be run simultaneously in 
different isolated threads with no communication between them. 
Once the chains are finished, they can be combined into a large 
chain. Since each Markov chain (after the burn-in period) is sam- 
pling from the posterior distribution, we end up with a very large 
chain that also samples from the posterior distribution. Except 
for the presence of a burn-in period in each chain, the gain in 
computational time is roughly proportional to the number of 
threads. A more refined way of parallelization is to start a chain 
and, after the burn-in period, subdivide it into different threads. 
At the end, all the threads are combined and we end up with a 
long chain. In this case, the gain in computational time is slightly 
larger than in the previous case. 

The inversion code for the Milne-Eddington case (Bayes- 
ME) is made available after contact with any of the authors. The 
present version of the code is extremely versatile and it presents 
a very good convergence rate. However, we plan to introduce 
different refinements in the future. The most straightforward is 
the modification of the proposal density so that non-diagonal 
elements of the covariance matrix can be taken into account. 
Although the convergence rate assuming a diagonal covariance 
matrix is acceptable, this refinement can lead to a reduction in 
the length of the chains because more underlying structure of 
the posterior distribution is captured in the proposal density. 
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Appendix A: Markov Chain Monte Carlo 

A. 1 . Metropolis algorithm 

The idea of this approach is to directly sample the posterior 
distribution using a Markov Chain. The elements of the chain 
are the vector of parameters that are used to describe every 
model. The Markov Chain is a stochastic process {0q, 0\,...,9 n } 
in which every element 0, only depends on the previous one . 
The key idea of the MCMC method is to choose the next point in 
the chain depending on the previous point such that the distribu- 
tion of the chain asymptotically tends to be equal to the posterior 
distribution, i.e.: 



lim p({0 ,0 u .,.,0 n })=p(0\D). 



(A.l) 



Several methods are available although we will focus o n the 
Metropolis algorithm ([Metropolis et alj|1953t lNeallll993l) that, 
in spite of its simplicity, gives extremely good results. The algo- 
rithm can be defined as follows: 

1. Choose a starting vector of parameters 0q. If some informa- 
tion is available about the value of some of the parameters, 
it is advantageous to start close to the solution. However, 
this condition is not mandatory for the convergence of the 
Markov chain. 

2. Calculate the posterior probability given the data p(0o\D). 
This includes the calculation of the priors and the likelihood 
(including the calculation of the forward modeling problem). 

3. Obtain a new vector of parameters 0, sampling from a pro- 
posal density distribution q(0j\6j-\). We will explain this step 
more in detail afterwards. 

4. Evaluate the posterior probability p(8i\D). 

5. Evaluate the ratio 



piOi-mqiOi-im' 

Admit 0, in the Markov Chain with probability 



P = min[l, f\. 

If a point is rejected, include 
6. Go back to step to 3. 



(A.2) 



(A.3) 



in the chain. 



It has been shown that the previous numerical scheme leads to a 
Markov Chain whose probabili ty distribution converge s towards 
the posterior distribution (e.g., Metro polis et al][l953l) . The ad- 
vantage with respect to the brute force approach is that the num- 
ber of evaluations of the posterior distribution is no more ex- 
ponentially increasing with the number of parameters, but lin- 
early. As a consequence, we can treat much more complicated 
problems with a reduced computational effort. The reason for 
this behavior is that since the chain is sampling the underly- 
ing posterior distribution, the regions of larger probability are 
evaluated more times. It is important to point out that the pro- 
posal density distribution is usually chosen to be symmetric, thus 
q(0i\0j-\) = q(0i-\\8i). As a consequence, the ratio that have to 
be evaluated at step 5 simplifies to r — p{6i\D)/ p(0^\\D). 



A.2. The proposal density 

The key ingredient of the Metropolis MCMC algorithm is the 
proposal density. In the ideal case, one should choose <?(#,■ |#,_i) 
as close to the posterior distribution as possible. In the limiting 
case that the proposal distribution is exactly matching the pos- 
terior one, one is carrying out a perfect sampling: more samples 
are performed in the regions of larger probability. Consequently, 
all the proposed steps will be included into the Markov Chain. 
This case is obviously unrealistic because it assumes that our 
aim (i.e., the evaluation of the posterior distribution) has been 
already achieved. 

The power of the MCMC scheme lies in the fact that, even 
naively chosen proposal densities lead to an algorithm that ef- 
ficiently samples from the posterior distribution. However, it is 
also true that a smart election of the proposal density greatly im- 
proves the convergence rate of the algorithm. Common proposal 
densities include gaussian, normal or uniform distributions cen- 
tered at the current value of the parameters to propose a new 
value of the parameters. In our case, we have chosen a combi- 
nation of gaussian and uniform distributions. Both cases lead to 
a symmetric proposal density. For the initial N, m if steps of the 
chain, we have decided to propose parameters following a uni- 
form distribution in each parameter. The limits of the uniform 
distribution are free parameters chosen to be equal to their range 
of variation. The minimum values for all the parameters are put 
into the vector mm while the maximum values are included into 
the vector raax . Then: 



qmOi-x) ~ ^(rr m ,<r ax ). 



(A.4) 



After the first 7Y un if steps, some information about the poste- 
rior probability is known. Therefore, statistical properties like 
the covariance matrix C can be estimated. At this point, we 
change to a gaussian proposal density centered at the current 
value of the parameters. Ideally, one should propose with the 
following distribution: 



q{0i\0i-\) ~ exp 



2 



(A.5) 



where u = 0, ■ - stands for the transpose of the u vec- 

tor and a is a constant whose meaning will be discussed later. 
Sampling from such a proposal density would require the diag- 
onali zation of the covarian ce matrix due to the matrix inversion 
(e.g.. iDunklev et ai]|2005l) . This proposal density is very use- 
ful for problems in which strong degeneracies are present in the 
problem, so that the posterior distribution shows very elongated 
maxima. However, in the first version of our inversion code, we 
neglect the non-diagonal elements of the covariance matrix. We 
have verified that this approximation gives extremely good re- 
sults in our case (in spite of the degeneracies present in the prob- 
lem). The inclusion of non-diagonal terms in the covariance ma- 
trix is left for future revisions of the code. 

When we only take into account the diagonal elements of the 
covariance matrix, the proposal of each parameter can be done 
independently of the rest of parameters. Random numbers fol- 
lowing a normal distribution with unit variance are picked and 
the proposed value for each parameter is obtained by multiply- 
ing them with their corresponding variances. The variances are 
updated after a fixed number of iterations of the Markov Chain. 
It is not necessary to use the whole Markov Chain to estimate the 
variances, because the following updating rule can be applied to 
update the variance of parameter i at step n: 



aj(ri) 



1 



-crf(n-l) 



(A.6) 
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[OM-Oiin-l)] [OM-Oiin)] 



(n+ l) 2 



(A.7) 



where #,(«) is the value of the proposed parameter, 6,{n - 1) 
stands for the average of the parameter i taking into account the 
first n - 1 elements of the chain, while 0,-(n) takes also into ac- 
count element n in calculating the average. The average can also 
be updated following the rule: 



8i(n) = 6i(n - 1) + 



6M - Bj{n - 1) 
n + 1 



(A.8) 



Concerning the constant a, it is used to tune the conver- 
gence process. It ha s been demonstrated ( Gel man et al.1 1 996 : 
iDunklev et al.| [2005) that, in order to efficiently sample from a 
posterior distribution, the acceptance rate of models should be 
of the order of 25%. We use a to shrink or broaden the pro- 
posal density so that such an acceptance rate is assured. We have 
verified with an extensive test phase that this technique behaves 
nicely and the chain rapidly samples the posterior distribution. 



Appendix B: Profiles 

According to Fig.g] fields above 500 G and below 1800 G fit the 
synthetic profile with added noise with a precision smaller than 
lcr. When this constraint is relaxed to 2<x, the fields can be even 
larger or smaller. Using the three plots of the upper panel of Fig. 
|U it is possible to detect a large number of combinations where 
fits inside the 68% confidence level can be obtained with sub-kG 
and kG fields. As an example, we show in Fig. IB. II a fit to the 
synthetic Stokes profiles with added noise with a field of 600 G 
and with a field of 1500 G. There is no objective reason to prefer 
one fit over the other under a lcr uncertainty, as consistent with 
the results presented in Fig. [4] Note that this result w as pointed 
out for the first time by Mart inez Gonzalez et al.l (120061) . 



A.3. Convergence 

The convergence of the Markov Chain is a c ritical issue (e.g., 
iGelman & Rubinlll992l: iLewis & Bridlell2002l) . A chain is said 
to be converged when the statistical properties of its elements 
reflect with "enough accuracy" the statistical properties of the 
underlying distribution that is being sampled. A problem arises 
for what "enough accuracy" means. Great efforts have been 
put into the developm ent of powerful convergence tests (e.g., 
IGelman & Rub in 1992). The key ingredient in dictating the con- 
vergence rate is the proposal density distribution. One of the 
most wide ly applied methods of c onvergence test is the one pro- 
posed by Gelman & Rubin! (Q992). The main drawback is that it 
works by generating several Markov chains with random initial 
points. A posterior analysis of their statistical properties helps us 
to distinguish when a chain is sampling from the posterior dis- 
tribution. At this point, the elements of the chain can be used to 
obtain information about the statistical properties of the posterior 
di stribution that we are sampling. Our code uses the alternative 
of IDunklev et al.l d2005) for testing for convergence. It is based 
on the idea that the Fourier power spectrum of a the Markov 
chain would be flat and equal to the variance of the underlying 
distribution in case complete convergence is obtained. However, 
the chain can b e considered as conv erged in much less restrictive 
conditions (see IDunklev et al.ll2005l for details). 

At the beginning of the MCMC algorithm, the chain typ- 
ically proposes large jumps through the parameter space until 
the regions of high posterior probability distribution are located. 
This is specially true when the initial point of the chain is very 
far away from the regions of large posterior density. The chain, 
once migrated to these regions, proposes smaller jumps. The ini- 
tial steps of the chain are not representative of the underlying 
posterior p(0\D). They are usually known as the "burn-in" of the 
chain and these elem ents are typically thrown away. Following 
IDunklev et al.l d2005l) . one easy way to locate the number of el- 
ements of the "burn-in" is to locate the maximum value of the 
posterior p m . dx and discard the first elements of the chain until 
p(0\D)/p mdx > /, with / ~ 0.1 - 0.2. When the initial point 
of the chain is close to the high probability region, this scheme 
leads to a "burn-in" of a few (or even zero) elements. 
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Fig. B.l. Synthetic profiles with added noise (diamonds) together with two different synthetic profiles corresponding to models 
presenting magnetic field strengths differing by 900 G (solid lines). Since both models fit the profiles below lcr, there is no objective 
reason to favor one of them. 



